Fast and Exact Mining of Probabilistic Data Streams

نویسندگان

  • Reza Akbarinia
  • Florent Masseglia
چکیده

Discovering Probabilistic Frequent Itemsets (PFI) is very challenging since algorithms designed for deterministic data are not applicable in probabilistic data. The problem is even more difficult for probabilistic data streams where massive frequent updates need to be taken into account while respecting data stream constraints. In this paper, we propose FEMP (Fast and Exact Mining of Probabilistic data streams), the first solution for exact PFI mining in data streams with sliding windows. FEMP allows updating the frequentness probability of an itemset whenever a transaction is added or removed from the observation window. Using these update operations, we are able to extract PFI in sliding windows with very low response times. Furthermore, our method is exact, meaning that we are able to discover the exact probabilistic frequentness distribution function for any monitored itemset, at any time. We implemented FEMP and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results illustrate its very good performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

FMU: Fast Mining of Probabilistic Frequent Itemsets in Uncertain Data Streams

Discovering Probabilistic Frequent Itemsets (PFI) in uncertain data is very challenging since algorithms designed for deterministic data are not applicable in this context. The problem is even more difficult for uncertain data streams where massive frequent updates need be taken into account while respecting data stream constraints. In this paper, we propose FMU (Fast Mining of Uncertain data s...

متن کامل

Calculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms

The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...

متن کامل

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...

متن کامل

Frequent Sets Mining in Data Stream Environments

In recent years, data streams have emerged as a new data type that has attracted much attention from the data mining community. They arise naturally in a number of applications (Brian et al., 2002), including financial service (stock ticker, financial monitoring), sensor networks (earth sensing satellites, astronomic observations), web tracking and personalization (webclick streams). These stre...

متن کامل

Synopsis Construction in Data Streams

Unlike traditional data sets, stream data flow in and out of a computer system continuously and with varying update rates. It may be impossible to store an entire data stream due to its tremendous volume. To discover knowledge or patterns from data streams, it is necessary to develop data stream summarization techniques. Lots of work has been done to summarize the contents of data streams in or...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013